Spatial Analysis with the sf, ggplot, mapview, and leaflet Packages

We’ve spent time exploring how to wrangle data tables to reveal new information. An important step forward came when we joined two datasets on a common attribute (e.g. joining physical and nutrient attribute tables for LTER lakes), which allowed us to examine relationships across these two sources.
With GIS and spatial analysis, we don’t necessarily need a common attribute to join tables from different sources; instead, we can use location. If we know the location of our NTL-LTER lakes, for example, we can compute their distances from the EPA air quality monitoring locations to explore possible relationships between the two datasets. Location, then, is quite a powerful ally in environmental data analysis.
In this lesson, we explore how data can be represented spatially, and how we can use location to query and analyze data. We’ll introduce some useful R libraries for handling, analyzing, and visualizing spatial datasets.
Spatial data are modeled in two formats: vector and raster. Today we’ll be concentrating on vector data, returning to raster a bit later. With vector data, features are represented with a combination of three elements: geometry, attributes, and a coordinate reference system, or CRS. Let’s talk about the first two and come back to the CRS a bit later.
With vector data, a spatial feature takes the form of a point, line, or a polygon, collectively referred to as its geometry.
Question: Think of some features you might see on a map. Would they best be represented as a point, line, or polygon? Can some features be represented by more than one type? Can you think of any features that you couldn’t map with a point/line/polygon?
In addition to its geometry, attributes are also linked to spatial features. Attributes hold all the non-geometric information associated with the feature: an ID or name, some measurements, a collection date, etc.
When we combine geometries with attributes we get a spatially enabled dataframe, and we actually have a few of these in the datasets we’ve been working with: the EPA air quality datasets contain two fields, SITE_LATITUDE and SITE_LONGITUDE, which combine to define a point geometry. And with that, we can easily plot our data geographically, i.e. “map” the data:
#Import the tidyverse library
library(tidyverse, quietly = TRUE)
## -- Attaching packages -------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#Read in the raw 2018 PM2.5 data
EPAair_PM25_NC2018_raw <- read.csv("./Data/Raw/EPAair_PM25_NC2018_raw.csv")
#Reduce our data to just one record for each location, computing mean and max daily AQI values
EPAair_PM_avg <- EPAair_PM25_NC2018_raw %>% # collapse each site's records into one
group_by(Site.Name, COUNTY, SITE_LATITUDE, SITE_LONGITUDE) %>%
summarize(meanPM = mean(Daily.Mean.PM2.5.Concentration),
maxPM = max(Daily.Mean.PM2.5.Concentration))
#Plot the data using longitude and latitude as X and Y
ggplot(EPAair_PM_avg, aes(x=SITE_LONGITUDE, y=SITE_LATITUDE)) +
geom_point(aes(color=meanPM), size=4) + coord_equal() # coord_equal() keeps the x and y scales equal
This is more of a proof of concept than a useful map. We can sort of visualize spatial patterns in the data, but without context of scale or location, it’s somewhat difficult. We can greatly improve this map with the help of some mapping packages for R…
See https://rdrr.io/cran/mapview/man/mapviewOptions.html for more info.
#(Install and) Load the mapView package
#install.packages('mapview')
library(mapview) # handy for quick interactive maps
## Warning: package 'mapview' was built under R version 3.5.3
#Set the available map backgrounds
mapviewOptions(basemaps = c('OpenStreetMap','Esri.WorldImagery','Stamen.Toner','Stamen.Watercolor'))
#Create a mapView map from our EPA data
myMap = mapview(EPAair_PM_avg,
xcol = "SITE_LONGITUDE",
ycol = "SITE_LATITUDE",
crs = 4269,
grid = FALSE)
#Show the map
myMap
#Save the map (if you want, and if "phantomJS" is installed)
#mapshot(myMap, file='EPA_SiteMap.png') #Save to a PNG file
#mapshot(myMap, url='EPA_SiteMap.html') #Save to an HTML file
Much better, no? And interactive! But while mapview loosely follows a familiar ggplot format, it’s not quite as flexible as other packages, such as leaflet…
See http://rstudio.github.io/leaflet/ for more info
#Import libraries
#install.packages('leaflet')
library(leaflet)
## Warning: package 'leaflet' was built under R version 3.5.3
#Create the map and add Tiles
myMap <- leaflet(data=EPAair_PM_avg) %>%
addTiles() %>% # tiles are the background basemaps (see also the ggmap package)
addCircleMarkers(~SITE_LONGITUDE,~SITE_LATITUDE,
radius=(~meanPM*2),
stroke = FALSE,
fillOpacity = 0.3,
popup = ~as.character(`Site.Name`))
# popup sets which field appears when you click a point
# Show the map
myMap
There’s much more we can do to make these maps prettier and more powerful; we’ll come back to that. For now, take time to notice the power of mapping: as with previous plots, we can interpret patterns in our dataset, but by mapping the data geographically we can view these patterns in a much richer context alongside other spatial datasets (e.g. our basemaps). Ok, let’s continue talking about spatial features in the vector model…
In our EPA example, we used a coordinate pair (e.g. latitude/longitude) to represent the location of our point features. But how might line and polygon features be represented?
Well, a line is simply a sequence of points (often called vertices in this context), and so line features can be represented as a series of point coordinates.
And polygons are just areas enclosed by a line, so polygon features can be represented as a series of coordinate pairs where the last coordinate pair is the same as the first. [Some polygons can have holes in the middle, which we can handle too, but let’s keep it simple for now…]
Figure of different geometries.
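Before we get to dedicated spatial packages, these representations can be sketched with plain R objects (an illustration only; the object names and coordinates below are made up):

```r
# A point: a single coordinate pair (longitude, latitude)
pt <- c(-78.94, 36.00)

# A line: an ordered sequence of vertices, one coordinate pair per row
line_vertices <- rbind(c(-78.9, 36.0),
                       c(-78.8, 36.1),
                       c(-78.7, 36.0))

# A polygon: a closed ring of vertices, with the last repeating the first
poly_ring <- rbind(c(-79, 36),
                   c(-78, 36),
                   c(-78, 37),
                   c(-79, 36))

# Confirm the ring is closed: first and last vertices match
identical(poly_ring[1, ], poly_ring[nrow(poly_ring), ])
```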
Thinking back to our EPA dataset, however, it seems quite a bit more cumbersome to store collections of coordinate pairs for line and polygon features compared to our simple SITE_LATITUDE and SITE_LONGITUDE columns.
So how are geometries usually stored and dealt with in R? Enter the sf package.
The sf package

While the sp package was the first to handle vector spatial data in R, it was largely superseded by the sf package in 2016, and that made our lives easier. Short for “Simple Features”, the sf package includes a structured format for storing geometries in spatially enabled dataframes, solving the issue we just mentioned about storing gobs of coordinate pairs.
The sf package also enables us to read from and write to a number of spatial data formats (using the open source GDAL engine) and perform a host of spatial operations with these spatial features (using the open source GEOS engine). It also provides tools for dealing with that third component of geospatial data that we haven’t yet discussed: the coordinate reference system.
Let’s explore a few examples to familiarize ourselves with the sf package.
First, let’s convert our EPA dataframe to a simple features (“sf”) dataframe and explore what’s different:
#Import the sf library
#install.packages('sf')
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
#Convert the dataset to a spatially enabled "sf" data frame
EPAair_PM_avg_sf <- st_as_sf(EPAair_PM_avg,coords = c('SITE_LONGITUDE','SITE_LATITUDE'),crs=4326) # crs = coordinate reference system; more on this later
#Do you see a new column name?
colnames(EPAair_PM_avg_sf)
## [1] "Site.Name" "COUNTY" "meanPM" "maxPM" "geometry"
#What is the class of the values in the geometry column?
class(EPAair_PM_avg_sf$geometry)
## [1] "sfc_POINT" "sfc"
#What does this look like
head(EPAair_PM_avg_sf)
## Simple feature collection with 6 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -83.44213 ymin: 34.36417 xmax: -76.20717 ymax: 35.6062
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 5
## Site.Name COUNTY meanPM maxPM geometry
## <fct> <fct> <dbl> <dbl> <POINT [°]>
## 1 "" Hyde 4.1 10.9 (-76.20717 35.45109)
## 2 Blackstone Lee 9.21 27.1 (-79.2887 35.4325)
## 3 Board Of Ed. Bldg. Buncombe 6.61 22.3 (-82.5844 35.6062)
## 4 Bryson City Swain 7.63 25 (-83.44213 35.43477)
## 5 Candor: EPA CASTNet Site Montgomery 6.74 34.2 (-79.83661 35.2632)
## 6 Castle Hayne New Hanover 3.63 15.5 (-77.83861 34.36417)
#Plot the geometry...
plot(EPAair_PM_avg_sf$geometry)
#Plot everything
plot(EPAair_PM_avg_sf)
#Plot everything -- with MapView
mapview(EPAair_PM_avg_sf)
#Plot a single variable in MapView
mapview(EPAair_PM_avg_sf['meanPM'])
#With geometries now available in a column, we can use ggplot with the geom_sf object
ggplot() +
geom_sf(data=EPAair_PM_avg_sf, aes(color=meanPM), size=4)
So, we see that our “sf” dataframe works much like our familiar dataframe, only it has a new column containing geometries for each record. This pretty much sums up what a GIS is: a familiar table of records and attributes (i.e. an “Information System”), but with one attribute that includes a geometry that allows us to incorporate geography into our analysis (i.e. a “Geographic Information System”)!
With our EPA data now spatially enabled, we can perform spatial analysis with the data. A simple example is buffering points out a set distance, done with the st_buffer operation.
#Buffer the Durham Armory point 0.1 degrees
Durham_buffer <- EPAair_PM_avg_sf %>%
filter(Site.Name == 'Durham Armory') %>%
st_buffer(0.1) # note: the distance is in the CRS's units, here 0.1 degrees
## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
#View the result
mapView(Durham_buffer)
In running the above, you get a warning and the shape looks elliptical, not circular. What’s up? Well, it’s time to chat about coordinate reference systems…
So far, all our spatial data have used lat/long coordinates. This is fine for plotting and many other operations, but lat/long coordinates are spherical coordinates (i.e. angles), while the geometry we are used to is done in planar coordinates (i.e. lengths). Going between the two is a tricky matter because:
1. You can’t flatten a sphere into a plane without distorting area, distance, shape, and/or direction.
2. The earth is not a perfect sphere to begin with.
The first issue is handled by projecting the data. (Think of putting a light source in the middle of your sphere and projecting its surface onto a wall…) Various methods of projecting data exist, each tailored to minimize distortion of a particular type (area | distance | shape | direction) and at a particular location.
Projecting data involves a lot of math, but there are equations for that. What further complicates the matter is point 2 above: the earth is not a perfect sphere, but rather an ellipsoid, and an irregular one at that. Over time, people have devised various models to depict the true flattened shape of the earth. These are called spheroids (or ellipsoids). And on top of that, people have devised additional models to incorporate local deviations from these spheroid models. These are called datums.
The bottom line is that we have numerous different coordinate reference systems. We have geographic coordinate systems (GCS), in which coordinates remain in geographic (i.e. angular) units, and projected coordinate systems (PCS), in which coordinates are in planar units. Both GCS and PCS require a defined spheroid and datum for depicting the shape of the earth. A PCS additionally requires information on how the surface of this sphere is projected onto a plane and where the origin of the coordinate system sits. Which CRS to choose depends on the location and extent of your study, and on whether it is most important to maintain correct areas, shapes, distances, or directions.
Coordinate reference systems can indeed be confusing, but they are “a necessary evil” as they allow us to combine spatial datasets from various sources as long as we can successfully translate our data from one CRS to another. Fortunately, R and other geospatial tools are there to help us with that; we just have to keep our accounting straight.
The website http://spatialreference.org/ lists hundreds of standard CRSs, along with a map and description of where each is appropriate. For example, a useful one in North Carolina is UTM Zone 17N, NAD83. Note also that its EPSG code is 26917; this EPSG code is one way we can assign a CRS to our data. If you click on the “Well Known Text as HTML” link, it will reveal all the specifics associated with that CRS… (Another site listing coordinate reference systems is http://www.epsg.org/)
Some CRSs, however, do not have an associated EPSG code, and for those we can use the “proj4string”, which is a long-hand form of the projection information. Just to make it more confusing, some coordinate systems also have a “WKT”, or “well-known text”, representation. Mostly, these are all interchangeable, and the important bit here is that you understand the need for and requirements of associating a CRS with a spatial dataset. I found this site gives a useful overview of the nuances of coordinate systems: https://bit.ly/2XUGMyX
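To see this interchangeability in action, sf’s st_crs() function can resolve a CRS from either an EPSG code or a proj4string (a quick sketch; the exact printout depends on your installed versions of sf and PROJ):

```r
library(sf)

# Look up the NC-friendly UTM Zone 17N / NAD83 CRS by its EPSG code...
st_crs(26917)

# ...or describe (nominally) the same CRS with a proj4string
st_crs("+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs")
```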
#Recall what the CRS is of our EPA dataset
st_crs(EPAair_PM_avg_sf) # EPSG:4326; datum WGS84
## Coordinate Reference System:
## EPSG: 4326
## proj4string: "+proj=longlat +datum=WGS84 +no_defs"
Now look that EPSG code up on http://spatialreference.org:
* What coordinate system is this EPSG code associated with?
* What is the extent of this projection?
* Is this a geographic or projected coordinate system?
* What datum does it use?
Applying our knowledge of CRS, let’s transform our EPA sites from its native geographic coordinate system to a projected one, namely UTM Zone 17N and repeat what we attempted earlier. Then we’ll revisit the buffer.
#Transform the entire EPA dataset to the UTM Zone 17N crs
EPAair_PM_avg_sf_UTM <- st_transform(EPAair_PM_avg_sf, crs=26917)
# CRS codes can be looked up on the web (e.g. spatialreference.org)
#Rebuffer the Durham Armory site. Is it now circular?
DA_UTM_buffer <- EPAair_PM_avg_sf_UTM %>%
filter(Site.Name == 'Durham Armory') %>% # Filter for the Durham Armory Site
st_buffer(2000) # Buffer 2000 meters
mapView(DA_UTM_buffer) # View the results: now a circle
As we are now in a planar coordinate system, we can do some more spatial analysis… We’ll examine one - a distance analysis - as a quick preview, with more to come after we add some other datasets…
#Compute the distance between the Durham Armory site and all other sites
Distances_to_DASite <- EPAair_PM_avg_sf_UTM %>%
filter(Site.Name == 'Durham Armory') %>% #Subset the Durham Armory site
st_distance(EPAair_PM_avg_sf_UTM) %>% #Compute distances to all other sites
data.frame() %>% t #Transpose the result
#Add new field of distances to Durham Armory site
EPAair_PM_avg_sf <- mutate(EPAair_PM_avg_sf,Dist2DA = Distances_to_DASite)
#Plot to see that it worked: larger = further?
ggplot() +
geom_sf(data=EPAair_PM_avg_sf,aes(size=Dist2DA)) # size shows distance from a given site
Now that we have a feel for how vector spatial data are stored, let’s dive deeper into the types of spatial analyses we can do:
* Reading shapefiles into R with sf (and selecting with filter)
* Spatial aggregation with group_by and summarize, or st_union
* Visualizing multiple datasets
* Changing CRS with st_transform
* Attribute joins with merge
* Spatial joins
* Geometry manipulations
  * Buffers
  * Convex hulls
  * Voronoi polygons
* Selecting polygons by location (buffer and intersect)
Reading shapefiles with sf

The sf package also allows us to read many existing data formats, including ArcGIS shapefiles. I’ve added a few shapefiles to our Data folder: one of all US counties and another of the 8-digit hydrologic unit codes (HUCs) for NC…
Below we read in the USA counties shapefile, filtering for just the NC features (NC has a state FIPS code of 37…)
counties_sf<- st_read('./Data/Spatial/cb_2017_us_county_20m.shp') %>%
filter(STATEFP == 37) #Filter for just NC Counties
## Reading layer `cb_2017_us_county_20m' from data source `C:\Users\Felipe\OneDrive - Duke University\1. DUKE\1. Ramos 2 Semestre\EOS-872 Env. Data Analytics\Environmental_Data_Analytics\Data\Spatial\cb_2017_us_county_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
#CRS (Is this the same as the EPA Dataset?)
st_crs(counties_sf) # EPSG:4269 (NAD83); we can look this up on spatialreference.org
## Coordinate Reference System:
## EPSG: 4269
## proj4string: "+proj=longlat +datum=NAD83 +no_defs"
#Column names
names(counties_sf)
## [1] "STATEFP" "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID" "NAME"
## [7] "LSAD" "ALAND" "AWATER" "geometry"
#Reveal the number of features in this dataset
nrow(counties_sf)
## [1] 100
#Reveal the extent of this dataset via the st_bbox() function
st_bbox(counties_sf) # a box around your data
## xmin ymin xmax ymax
## -84.32187 33.85111 -75.45866 36.58812
#View the data
head(counties_sf)
## Simple feature collection with 6 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -80.04687 ymin: 34.83372 xmax: -77.1008 ymax: 36.54242
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME LSAD ALAND
## 1 37 181 01008591 0500000US37181 37181 Vance 06 653705784
## 2 37 107 01008568 0500000US37107 37107 Lenoir 06 1033630023
## 3 37 147 01008578 0500000US37147 37147 Pitt 06 1689619168
## 4 37 081 01008558 0500000US37081 37081 Guilford 06 1672545660
## 5 37 093 01008563 0500000US37093 37093 Hoke 06 1010380313
## 6 37 195 01008596 0500000US37195 37195 Wilson 06 952010950
## AWATER geometry
## 1 42187365 MULTIPOLYGON (((-78.49778 3...
## 2 5900300 MULTIPOLYGON (((-77.81865 3...
## 3 8248766 MULTIPOLYGON (((-77.66513 3...
## 4 30723331 MULTIPOLYGON (((-80.04324 3...
## 5 3981006 MULTIPOLYGON (((-79.45875 3...
## 6 14334531 MULTIPOLYGON (((-78.17534 3...
#Plot the data
mapView(counties_sf)
Now you try: Read in the NC 8-Digit HUC dataset: ‘./Data/Spatial/huc_250k_nc.shp’. What CRS does this dataset use? Is it the same as the counties dataset? What columns are included in this dataset?
#Read the shapefile into an sf dataframe named "huc8_sf"
huc8_sf <- st_read('./Data/Spatial/huc_250k_nc.shp')
## Reading layer `huc_250k_nc' from data source `C:\Users\Felipe\OneDrive - Duke University\1. DUKE\1. Ramos 2 Semestre\EOS-872 Env. Data Analytics\Environmental_Data_Analytics\Data\Spatial\huc_250k_nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 10 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 979948.1 ymin: 1255304 xmax: 1833587 ymax: 1739964
## epsg (SRID): NA
## proj4string: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD27 +units=m +no_defs
#Check the CRS
st_crs(huc8_sf)
## Coordinate Reference System:
## No EPSG code
## proj4string: "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD27 +units=m +no_defs"
#Examine the column names
names(huc8_sf)
## [1] "AREA" "PERIMETER" "HUC250K_" "HUC250K_ID" "HUC_CODE"
## [6] "HUC_NAME" "REG" "SUB" "ACC" "CAT"
## [11] "geometry"
plot(huc8_sf) #plot all attributes
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to
## plot all
#View the data as a map
plot(huc8_sf['SUB'],
graticule=st_crs(4326), # the grid looks skewed because the graticule's CRS (WGS84) differs from the data's projection
axes=TRUE)
plot(huc8_sf['SUB'],
graticule=st_crs(huc8_sf), # using the dataset's own CRS fixes it
axes=TRUE)
#View the data as a map
mapView(huc8_sf)
Challenge: Read in the NC 8-Digit HUC dataset again, but this time filter the data so the result only includes the one with a HUC_NAME value of ‘Upper Neuse’.
#Read the shapefile into an sf dataframe
upperNeuse_sf <- st_read('./Data/Spatial/huc_250k_nc.shp') %>%
filter(HUC_NAME == "Upper Neuse")
## Reading layer `huc_250k_nc' from data source `C:\Users\Felipe\OneDrive - Duke University\1. DUKE\1. Ramos 2 Semestre\EOS-872 Env. Data Analytics\Environmental_Data_Analytics\Data\Spatial\huc_250k_nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 10 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 979948.1 ymin: 1255304 xmax: 1833587 ymax: 1739964
## epsg (SRID): NA
## proj4string: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD27 +units=m +no_defs
mapView(upperNeuse_sf)
#Aggregate the data using group_by and summarize, just as you would a non-spatial dataframe
state_sf <- counties_sf %>% # aggregate so all counties dissolve into one feature
group_by(STATEFP) %>% # (with a single state, this grouping isn't strictly needed)
summarize(ALAND = sum(ALAND))
#View the data
mapview(state_sf)
#Alternatively, we could use the `st_union` function
state_sf2 <- st_union(counties_sf) # alternative: dissolve all features at once
mapview(state_sf2)
Now you try it: Aggregate the HUC data on the SUB attribute, computing the sum of the “AREA” field and view the result.
huc2_sf <- huc8_sf %>% # aggregate the HUCs by sub-region
group_by(SUB) %>%
summarize(AREA = sum(AREA))
mapview(huc2_sf)
mapview(huc2_sf['AREA'])
When dealing with multiple datasets, it’s best to get them all into the same CRS. This can be done with the st_transform command, supplying the EPSG code of the CRS you want your data to be in. Currently we have five spatial datasets going. Let’s get them all into a consistent CRS.
#Convert all to UTM Zone 17N (crs = 26917)
# EPAair_PM_avg_sf_UTM is already in UTM
counties_sf_utm <- st_transform(counties_sf, crs = 26917)
state_sf_utm <- st_transform(state_sf, crs = 26917)
huc8s_sf_utm <- st_transform(huc8_sf, crs = 26917)
huc2_utm <- st_transform(huc2_sf, crs = 26917)
#Convert all to WGS84 (crs = 4326)
EPAair_wgs84 <- st_transform(EPAair_PM_avg_sf, crs = 4326)
counties_WGS84 <- st_transform(counties_sf_utm, crs = 4326)
state_WGS84 <- st_transform(state_sf, crs = 4326)
huc8s_WGS84 <- st_transform(huc8_sf, crs = 4326)
huc2_WGS84 <- st_transform(huc2_sf, crs = 4326)
#Now plot with leaflet: no errors as all layers are in Leaflet's native CRS (WGS84)
leaflet() %>% addTiles() %>%
addPolygons(data=counties_WGS84,weight=1,color='red') %>%
addPolygons(data=huc8s_WGS84,weight=1)
You can plot multiple spatial datasets just as we’ve done with tabular datasets. With spatial data, however, the order in which you plot the layers is important: later layers are drawn on top of earlier ones…
#Wrong order
ggplot() +
geom_sf(data = EPAair_PM_avg_sf_UTM, color='white', size=2) +
geom_sf(data = counties_sf_utm, aes(fill = ALAND), color = 'white') +
geom_sf(data = state_sf_utm, color='red',size=2) +
scale_fill_gradient(low="yellow", high="darkgreen")
#Right order
ggplot() +
geom_sf(data = state_sf_utm, color='red',size=2) +
geom_sf(data = counties_sf_utm, aes(fill = ALAND), color = 'white') +
geom_sf(data = EPAair_PM_avg_sf_UTM, color='blue', size=2) +
scale_fill_gradient(low="yellow", high="darkgreen")
Tip: See http://leaflet-extras.github.io/leaflet-providers/preview for other basemaps
leaflet() %>%
addProviderTiles(providers$Esri.WorldShadedRelief) %>%
addPolygons(data = counties_WGS84, color = "orange", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5,
fillColor = ~colorQuantile("YlGnBu", ALAND)(ALAND)) %>%
addPolygons(data = huc2_WGS84, color=NA, weight = 2) %>%
addMarkers(data=EPAair_wgs84,popup = ~as.character(`Site.Name`))
m1 <- leaflet() %>%
addTiles() %>%
addPolygons(data = counties_WGS84, color = "orange", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5,
fillColor = ~colorQuantile("YlOrRd", ALAND)(ALAND)) %>%
addMarkers(data=EPAair_wgs84,popup = ~as.character(`Site.Name`))
m2 <- leaflet() %>%
addProviderTiles(providers$Stamen.TonerHybrid) %>%
addPolygons(data = huc8s_WGS84,weight=0.2,color='red') %>%
addCircleMarkers(data=EPAair_wgs84,
radius=(~meanPM*2),
stroke = FALSE,
fillOpacity = 0.3,
popup = ~as.character(`Site.Name`))
latticeview(m1, m2)
sync(m1,m2)
More spatial analysis: clipping one dataset with another…
#Clip the HUC2 data set by the NC State boundary dataset
huc2_UTM_nc <- st_intersection(huc2_utm,state_sf_utm) #chop off the areas that fall outside
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
mapview(huc2_UTM_nc) # see the sf cheat sheet for other operations besides intersection
Exercise: Try subsetting the huc2_utm dataset for SUB = “0302”, then clipping the counties with that subset.
huc0302 <- huc2_utm %>%
filter(SUB=="0302") #quotes needed because SUB is a factor
mapview(huc0302)
huc0302_counties <- st_intersection(huc0302,counties_sf_utm)
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
#Map the product, showing values by land area
mapview(huc0302_counties['ALAND'])
Attribute joins with merge

To add more attributes to our spatial features, we can join records just as we do with non-spatial dataframes, i.e. with the merge command (or, as below, dplyr’s left_join). Here we will read in a table of county census data and join it to our counties_sf dataset.
#Read in the demographic data
demog_df <- read.csv('./Data/Spatial/NC_Demography.csv') %>%
mutate(FIPS = as.factor(FIPS)) #Convert the FIPS variable to a factor
#Join the two datasets using "GEOID" from the left dataset and "FIPS" from the right
counties_sf_utm_join <- counties_sf_utm %>%
left_join(y = demog_df,by = c("GEOID" = "FIPS"))
## Warning: Column `GEOID`/`FIPS` joining factors with different levels,
## coercing to character vector
#Check the dimensions
dim(counties_sf_utm)
## [1] 100 10
dim(demog_df)
## [1] 100 15
dim(counties_sf_utm_join)
## [1] 100 24
names(counties_sf_utm)
## [1] "STATEFP" "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID" "NAME"
## [7] "LSAD" "ALAND" "AWATER" "geometry"
names(demog_df)
## [1] "X" "AREA" "PERIMETER" "CNTY_" "CNTY_ID"
## [6] "NAME" "FIPS" "FIPSNO" "CRESS_ID" "BIR74"
## [11] "SID74" "NWBIR74" "BIR79" "SID79" "NWBIR79"
names(counties_sf_utm_join)
## [1] "STATEFP" "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID"
## [6] "NAME.x" "LSAD" "ALAND" "AWATER" "X"
## [11] "AREA" "PERIMETER" "CNTY_" "CNTY_ID" "NAME.y"
## [16] "FIPSNO" "CRESS_ID" "BIR74" "SID74" "NWBIR74"
## [21] "BIR79" "SID79" "NWBIR79" "geometry"
#Plot the counties colored by one of the newly joined variables
ggplot() +
geom_sf(data = counties_sf_utm_join, aes(fill=BIR74)) +
scale_fill_gradient("BIR74",low='white',high='darkblue')
#Or with mapview
mapview(counties_sf_utm_join['BIR74'])
We can also manipulate the geometries in interesting ways. The sf cheat sheet is handy for exploring: https://github.com/rstudio/cheatsheets/raw/master/sf.pdf
#Select the triangle counties into a new sf data frame
triCo <- counties_sf_utm %>%
filter(NAME %in% c("Durham","Wake", "Orange", "Chatham"))
#Plot
myMap = ggplot() +
geom_sf(data = triCo)
myMap
#Compute the centroids and show them
triCo_centroids <- st_centroid(triCo)
## Warning in st_centroid.sf(triCo): st_centroid assumes attributes are
## constant over geometries of x
myMap <- myMap + geom_sf(data = triCo_centroids, color = 'blue')
myMap
#Buffer the centroids outward 2 km and add them to our map
triCo_centroids_2km <- st_buffer(triCo_centroids, 2000)
myMap <- myMap + geom_sf(data = triCo_centroids_2km, color = 'orange', fill=NA)
myMap
#Buffer the counties inward 2km
triCo_in2km <- st_buffer(triCo, -2000)
myMap <- myMap + geom_sf(data = triCo_in2km, color = 'green', fill=NA)
myMap
#Combine the centroids into one feature and construct a convex hull around them
triCo_centroids_chull <- triCo_centroids %>%
st_union() %>% #combine the individual points into a single multipoint geometry
st_convex_hull()
myMap <- myMap + geom_sf(data = triCo_centroids_chull, color = 'red', fill=NA)
myMap
#Combine the centroids into one feature and draw voronoi polygons
triCo_centroids_voronoi <- triCo_centroids %>%
st_union() %>%
st_voronoi() # each polygon is the area closer to its point than to any other
myMap <- myMap + geom_sf(data = triCo_centroids_voronoi, color = 'purple', fill=NA)
myMap
#User coordinates
userLat = 36.0045442
userLng = -78.9426381
#Create a simple features point geometry from the point
theSite_sfp <- st_point(c(userLng,userLat)) #a simple feature geometry (sfg)
#Create a simple features column from the point geometry object
theSite_sfc <- st_sfc(theSite_sfp, crs = 4326) #the point plus a spatial reference
#Transform the point to match the CRS of the counties dataset
theSite_sfc_transformed <- st_transform(theSite_sfc, crs = st_crs(counties_sf_utm))
#Create a boolean mask
resultMask <- st_intersects(counties_sf_utm,
theSite_sfc_transformed,
sparse = FALSE) #The `sparse` option returns a Boolean mask
#Filter the counties dataset using the boolean mask
selCounties <- counties_sf_utm[resultMask,] #keep the rows where the mask is TRUE
#Map the results
mapView(selCounties)
> Question: How might we use the st_buffer function to show all counties within 30 km of the site?
# Hint: take the site point, buffer it, and intersect the buffer with the counties.
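One way to sketch an answer, reusing the theSite_sfc_transformed point and counties_sf_utm dataframe created above (the distance is given in meters because the UTM CRS uses planar meters):

```r
#Buffer the projected site point by 30 km (30,000 m)
site_30km <- st_buffer(theSite_sfc_transformed, dist = 30000)

#Build a Boolean mask of counties intersecting the buffer
mask30 <- st_intersects(counties_sf_utm, site_30km, sparse = FALSE)

#Map the counties within 30 km of the site
mapView(counties_sf_utm[mask30, ])
```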
Ideas for further research: look up health-based thresholds for PM2.5 and identify areas that fall in dangerous zones, coloring the counties by their PM2.5 values. We might also look at correlations with county-level variables such as median income.